iterative optimization
Surfing: Iterative Optimization Over Incrementally Trained Deep Networks
We investigate a sequential optimization procedure to minimize the empirical risk functional $f_{\hat\theta}(x) = \frac{1}{2}\|G_{\hat\theta}(x) - y\|^2$ for certain families of deep networks $G_{\theta}(x)$. The approach is to optimize a sequence of objective functions that use network parameters obtained during different stages of the training process. When initialized with random parameters $\theta_0$, we show that the objective $f_{\theta_0}(x)$ is ``nice'' and easy to optimize with gradient descent. As learning is carried out, we obtain a sequence of generative networks $x \mapsto G_{\theta_t}(x)$ and associated risk functions $f_{\theta_t}(x)$, where $t$ indicates a stage of stochastic gradient descent during training. Since the parameters of the network do not change by very much in each step, the surface evolves slowly and can be incrementally optimized. The algorithm is formalized and analyzed for a family of expansive networks. We call the procedure {\it surfing} since it rides along the peak of the evolving (negative) empirical risk function, starting from a smooth surface at the beginning of learning and ending with a wavy nonconvex surface after learning is complete. Experiments show how surfing can be used to find the global optimum and for compressed sensing even when direct gradient descent on the final learned network fails.
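The warm-start idea in the abstract can be illustrated with a minimal NumPy sketch. Everything here is a toy stand-in: a two-layer ReLU generator, and a linear interpolation between random and final weights in place of actual SGD training snapshots; `descend` is plain gradient descent on $f_\theta(x) = \frac{1}{2}\|G_\theta(x) - y\|^2$ with the minimizer of each surface carried over as the starting point for the next.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h, n = 5, 20, 40                      # latent, hidden, output dimensions

def G(theta, x):
    W1, W2 = theta
    return W2 @ np.maximum(W1 @ x, 0.0)  # toy two-layer ReLU generator

def grad_x(theta, x, y):
    # Gradient of 0.5 * ||G_theta(x) - y||^2 with respect to x.
    W1, W2 = theta
    pre = W1 @ x
    r = G(theta, x) - y
    return W1.T @ ((W2.T @ r) * (pre > 0))

def descend(theta, x, y, lr=0.01, steps=300):
    for _ in range(steps):
        x = x - lr * grad_x(theta, x, y)
    return x

def init(shape, scale):
    return rng.normal(size=shape) * scale

# Stand-in "training trajectory": linearly interpolate from random initial
# weights to final weights (the paper rides actual SGD snapshots instead).
theta0 = (init((h, d), d ** -0.5), init((n, h), h ** -0.5))
theta1 = (init((h, d), d ** -0.5), init((n, h), h ** -0.5))
path = [tuple(a + t * (b - a) for a, b in zip(theta0, theta1))
        for t in np.linspace(0.0, 1.0, 10)]

x_true = rng.normal(size=d)
y = G(path[-1], x_true)                  # observations from the final network

x_init = rng.normal(size=d)
x = x_init
for theta in path:                       # surf: warm-start on each surface
    x = descend(theta, x, y)

risk_init = 0.5 * np.sum((G(path[-1], x_init) - y) ** 2)
risk = 0.5 * np.sum((G(path[-1], x) - y) ** 2)
print(risk_init, risk)
```

The first surface, with random weights, is the "nice" one; each subsequent descent starts from the previous surface's minimizer rather than from scratch, which is the entire point of the procedure.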
A Robust and Non-Iterative Tensor Decomposition Method with Automatic Thresholding
Hasegawa, Hiroki, Okada, Yukihiko
Recent advances in IoT and biometric sensing technologies have led to the generation of massive and high-dimensional tensor data, yet achieving accurate and efficient low-rank approximation remains a major challenge. Most existing tensor decomposition methods require predefined ranks and iterative optimization, resulting in high computational costs and dependence on analyst expertise. This study proposes a novel tensor low-rank approximation method that eliminates both prior rank specification and iterative optimization. The method applies statistical singular value hard thresholding to each mode-wise unfolded matrix to automatically extract statistically significant components, effectively reducing noise while preserving the intrinsic structure. Theoretically, the optimal thresholds for each mode are derived from the asymptotic properties of the Marcenko-Pastur distribution. Simulation experiments demonstrate that the proposed method outperforms conventional approaches (HOSVD, HOOI, and Tucker-L2E) in both estimation accuracy and computational efficiency. These results indicate that the proposed approach provides a theoretically grounded, fully automatic, and non-iterative framework for tensor decomposition.
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
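A minimal NumPy sketch of the non-iterative pipeline described above: unfold the tensor along each mode, hard-threshold the singular values to pick a rank automatically, then form a truncated Tucker-style reconstruction. The cutoff `2.858 * median(s)` is a Gavish-Donoho-style stand-in for the paper's Marcenko-Pastur-derived thresholds, and the helper names are illustrative, not the authors' API.

```python
import numpy as np

def unfold(T, mode):
    # Mode-wise unfolding: move the chosen mode to the front, flatten the rest.
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def mode_product(T, A, mode):
    # Multiply tensor T by matrix A along the given mode.
    return np.moveaxis(np.tensordot(A, np.moveaxis(T, mode, 0), axes=(1, 0)),
                       0, mode)

def auto_lowrank(T):
    """Non-iterative low-rank approximation with automatic per-mode ranks."""
    factors = []
    for mode in range(T.ndim):
        U, s, _ = np.linalg.svd(unfold(T, mode), full_matrices=False)
        tau = 2.858 * np.median(s)       # MP-motivated hard threshold (stand-in)
        r = max(1, int(np.sum(s > tau))) # keep statistically significant components
        factors.append(U[:, :r])
    core = T
    for mode, U in enumerate(factors):   # project onto retained subspaces
        core = mode_product(core, U.T, mode)
    That = core
    for mode, U in enumerate(factors):   # map the core back to the full space
        That = mode_product(That, U, mode)
    return That

rng = np.random.default_rng(0)
# Rank-(2,2,2) signal plus Gaussian noise.
A, B, C = (rng.normal(size=(30, 2)) for _ in range(3))
signal = np.einsum('ir,jr,kr->ijk', A, B, C)
noisy = signal + 0.1 * rng.normal(size=signal.shape)
That = auto_lowrank(noisy)
rel = np.linalg.norm(That - signal) / np.linalg.norm(signal)
print(rel)
```

No rank is specified and no alternating refinement is run; each mode is handled with a single SVD, which is what makes the approach non-iterative.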
VoxelOpt: Voxel-Adaptive Message Passing for Discrete Optimization in Deformable Abdominal CT Registration
Zhang, Hang, Zhang, Yuxi, Wang, Jiazheng, Chen, Xiang, Hu, Renjiu, Tian, Xin, Li, Gaolei, Liu, Min
Recent developments in neural networks have improved deformable image registration (DIR) by amortizing iterative optimization, enabling fast and accurate DIR results. However, learning-based methods often face challenges with limited training data and large deformations, and tend to underperform iterative approaches when label supervision is unavailable. While iterative methods can achieve higher accuracy in such scenarios, they are considerably slower than learning-based methods. To address these limitations, we propose VoxelOpt, a discrete optimization-based DIR framework that combines the strengths of learning-based and iterative methods to achieve a better balance between registration accuracy and runtime. VoxelOpt uses displacement entropy from local cost volumes to measure displacement signal strength at each voxel, which differs from earlier approaches in three key aspects. First, it introduces voxel-wise adaptive message passing, where voxels with lower entropy receive less influence from their neighbors. Second, it employs a multi-level image pyramid with 27-neighbor cost volumes at each level, avoiding exponential complexity growth. Third, it replaces hand-crafted features or contrastive learning with a pretrained foundational segmentation model for feature extraction. In abdominal CT registration, these changes allow VoxelOpt to outperform leading iterative methods in both efficiency and accuracy, while matching state-of-the-art learning-based methods trained with label supervision.
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States (0.04)
- (3 more...)
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning (1.00)
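The entropy-gated update in the first contribution can be sketched in a few lines of NumPy. This is an illustrative simplification, not VoxelOpt's actual update rule: it uses a 6-neighbor average in place of the paper's message passing, and the `lam` weight and softmax-based entropy are assumptions.

```python
import numpy as np

def softmax(c, axis=-1):
    e = np.exp(c - c.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def neighbor_mean(C):
    # 6-neighbour average along the three spatial axes (a simplification
    # of proper message passing; wraps at the borders via np.roll).
    acc = np.zeros_like(C)
    for ax in range(3):
        acc += np.roll(C, 1, axis=ax) + np.roll(C, -1, axis=ax)
    return acc / 6.0

def adaptive_message_pass(cost, lam=1.0):
    # cost: (D, H, W, K) matching cost over K candidate displacements.
    p = softmax(-cost)                   # low cost -> high probability
    K = cost.shape[-1]
    ent = -(p * np.log(p + 1e-12)).sum(-1) / np.log(K)  # normalized entropy
    # Confident voxels (low entropy) receive less influence from neighbours.
    return cost + lam * ent[..., None] * neighbor_mean(cost)

rng = np.random.default_rng(0)
cost = rng.normal(size=(4, 4, 4, 27))    # 27-neighbor cost volume on a tiny grid
cost[0, 0, 0] = 0.0
cost[0, 0, 0, 0] = -100.0                # near-certain displacement at this voxel
out = adaptive_message_pass(cost)
```

At the peaked voxel the normalized entropy is essentially zero, so its costs pass through almost unchanged, while ambiguous voxels are smoothed toward their neighbors.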
From Few to Many: Self-Improving Many-Shot Reasoners Through Iterative Optimization and Generation
Wan, Xingchen, Zhou, Han, Sun, Ruoxi, Nakhost, Hootan, Jiang, Ke, Arık, Sercan Ö.
Recent advances in long-context large language models (LLMs) have led to the emerging paradigm of many-shot in-context learning (ICL), where it has been observed that scaling to many more demonstration examples than the conventional few-shot setup can yield performance benefits. However, despite its promise, it is unclear which aspects dominate these benefits and whether simply scaling to more examples is the most effective way of improving many-shot ICL. In this work, we first analyze the factors driving many-shot ICL and find that 1) many-shot performance can often be attributed to a few disproportionately influential examples, and 2) identifying such influential examples ("optimize") and using them as demonstrations to regenerate new examples ("generate") can lead to further improvements. Inspired by these findings, we propose BRIDGE, an algorithm that alternates between the optimize step, which uses Bayesian optimization to discover the influential sets of examples, and the generate step, which reuses this set to expand the reasoning paths of the examples back to the many-shot regime automatically. On Gemini, Claude, and Mistral LLMs of different sizes, we show that BRIDGE leads to significant improvements across a diverse set of tasks, including symbolic reasoning, numerical reasoning, and code generation.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Europe > Austria > Vienna (0.14)
- North America > Canada > Ontario > Toronto (0.04)
- (9 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
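The optimize/generate alternation can be sketched as a plain Python loop. Everything below is a hedged stand-in: greedy leave-one-out scoring replaces the paper's Bayesian optimization over subsets, and `evaluate` and `expand` are hypothetical callables that a real system would back with LLM evaluations and generations.

```python
import random

def optimize_step(examples, evaluate, k):
    # Score each demonstration by its leave-one-out marginal contribution
    # (a greedy stand-in for Bayesian optimization over example subsets).
    base = evaluate(examples)
    scores = {i: base - evaluate(examples[:i] + examples[i + 1:])
              for i in range(len(examples))}
    top = sorted(scores, key=scores.get, reverse=True)[:k]
    return [examples[i] for i in top]

def generate_step(influential, expand, n_total):
    # Reuse the influential set to regenerate a many-shot demonstration pool.
    out = list(influential)
    while len(out) < n_total:
        out.append(expand(random.choice(influential)))
    return out

def bridge(examples, evaluate, expand, rounds=2, k=2, n_total=8):
    for _ in range(rounds):
        influential = optimize_step(examples, evaluate, k)   # "optimize"
        examples = generate_step(influential, expand, n_total)  # "generate"
    return examples

# Toy usage with placeholder metric and generator.
random.seed(0)
pool = ["good: 2+2=4", "bad: 2+2=5", "good: 3*3=9", "bad: 1+1=3", "bad: 5-2=1"]
evaluate = lambda ex: sum("good" in e for e in ex) / len(ex)  # toy proxy metric
expand = lambda e: e + " (variant)"                           # toy regeneration
result = bridge(pool, evaluate, expand, rounds=2, k=2, n_total=8)
```

With the toy metric, the leave-one-out scores single out the helpful demonstrations, and regeneration expands only those back to the many-shot size.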
Reviews: Surfing: Iterative Optimization Over Incrementally Trained Deep Networks
The paper proposes a new method for provably fitting deep generative models to observations, a highly non-convex optimization problem. Instead of directly searching for the latent code that explains the measurements, as proposed by Bora et al., this paper starts with a different deep generative model that has random weights, for which Hand et al. showed that gradient descent provably works. The weights of this generator are then incrementally modified to approach the true generator, each time using the previous optimum as the starting point. This sequence of models can be taken as snapshots of the model during the training process. The main result is a theory showing that warm-started non-convex optimization in expansive Gaussian networks yields successful recovery.